75 research outputs found
Zero-Shot Learning by Convex Combination of Semantic Embeddings
Several recent publications have proposed methods for mapping images into
continuous semantic embedding spaces. In some cases the embedding space is
trained jointly with the image transformation. In other cases the semantic
embedding space is established by an independent natural language processing
task, and then the image transformation into that space is learned in a second
stage. Proponents of these image embedding systems have stressed their
advantages over the traditional \nway{} classification framing of image
understanding, particularly in terms of the promise for zero-shot learning --
the ability to correctly annotate images of previously unseen object
categories. In this paper, we propose a simple method for constructing an image
embedding system from any existing \nway{} image classifier and a semantic word
embedding model, which contains the \n class labels in its vocabulary. Our
method maps images into the semantic embedding space via convex combination of
the class label embedding vectors, and requires no additional training. We show
that this simple and direct method confers many of the advantages associated
with more complex image embedding schemes, and indeed outperforms state of the
art methods on the ImageNet zero-shot learning task
Toward Online Measurement of Decision State
In traditional perceptual decision-making experiments, two pieces of data are collected on each trial: response time and accuracy. But how confident were participants and how did their decision state evolve over time? We asked participants to provide a continuous readout of their decision state by moving a cursor along a sliding scale between a 100% certain left response and a 100% certain right response. Subjects did not terminate the trials; rather, trials were timed out at random and subjects were scored based on the cursor position at that time. Higher rewards for correct responses and higher penalties for errors were associated with extreme responses so that the response with the highest expected value was that which accurately reflected the participant's odds of being correct. This procedure encourages participants to expose the time-course of their evolving decision state. Evidence on how well they can do this will be presented
Building high-level features using large scale unsupervised learning
We consider the problem of building high-level, class-specific feature
detectors from only unlabeled data. For example, is it possible to learn a face
detector using only unlabeled images? To answer this, we train a 9-layered
locally connected sparse autoencoder with pooling and local contrast
normalization on a large dataset of images (the model has 1 billion
connections, the dataset has 10 million 200x200 pixel images downloaded from
the Internet). We train this network using model parallelism and asynchronous
SGD on a cluster with 1,000 machines (16,000 cores) for three days. Contrary to
what appears to be a widely-held intuition, our experimental results reveal
that it is possible to train a face detector without having to label images as
containing a face or not. Control experiments show that this feature detector
is robust not only to translation but also to scaling and out-of-plane
rotation. We also find that the same network is sensitive to other high-level
concepts such as cat faces and human bodies. Starting with these learned
features, we trained our network to obtain 15.8% accuracy in recognizing 20,000
object categories from ImageNet, a leap of 70% relative improvement over the
previous state-of-the-art
- …